Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report
نویسندگان
چکیده
Our goal is to predict the first language (L1) of English essays’s authors with the help of the TOEFL11 corpus where L1, prompts (topics) and proficiency levels are provided. Thus we approach this task as a classification task employing machine learning methods. Out of key concepts of machine learning, we focus on feature engineering. We design features across all the L1 languages not making use of knowledge of prompt and proficiency level. During system development, we experimented with various techniques for feature filtering and combination optimized with respect to the notion of mutual information and information gain. We trained four different SVM models and combined them through majority voting achieving accuracy 72.5%.
منابع مشابه
NLI Shared Task 2013: MQ Submission
Our submission for this NLI shared task used for the most part standard features found in recent work. Our focus was instead on two other aspects of our system: at a high level, on possible ways of constructing ensembles of multiple classifiers; and at a low level, on the granularity of part-of-speech tags used as features. We found that the choice of ensemble combination method did not lead to...
متن کاملNAIST at the NLI 2013 Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) native language identification (NLI) system in the NLI 2013 Shared Task. We apply feature selection using a measure based on frequency for the closed track and try Capping and Sampling data methods for the open tracks. Our system ranked ninth in the closed track, third in open track 1 and fourth in open track 2.
متن کاملA Report on the 2017 Native Language Identification Shared Task
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2...
متن کاملA study of N-gram and Embedding Representations for Native Language Identification
We report on our experiments with Ngram and embedding based feature representations for Native Language Identification (NLI) as a part of the NLI Shared Task 2017 (team name: NLI-ISU). Our best performing system on the test set for written essays had a macro F1 of 0.8264 and was based on word uni, bi and trigram features. We explored n-grams covering word, character, POS and word-POS mixed repr...
متن کاملA Report on the First Native Language Identification Shared Task
Native Language Identification, or NLI, is the task of automatically classifying the L1 of a writer based solely on his or her essay written in another language. This problem area has seen a spike in interest in recent years as it can have an impact on educational applications tailored towards non-native speakers of a language, as well as authorship profiling. While there has been a growing bod...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013